Development of an Acoustic Model of the Croatian Language Using the HTK Toolkit
This paper presents the development of an acoustic model of the Croatian language for automatic speech recognition (ASR). Continuous speech recognition is performed by means of Hidden Markov Models (HMMs) implemented in the HMM Toolkit (HTK). To adapt HTK to the Croatian language, a novel algorithm for Croatian language transcription (CLT) has been developed. It is based on phonetic assimilation rules applied within uttered words. Phonetic questions for the state tying of triphone models of similar sounds have also been developed. An automated system for training and evaluating acoustic models has been developed and integrated with a new graphical user interface (GUI). The targeted applications of this ASR system are stress inoculation training (SIT) and virtual reality exposure therapy (VRET). Since adaptability of the model to a closed set of speakers is important for such applications, this paper investigates the applicability of HTK in typical scenarios. The robustness of the tool to a new language was tested under matched conditions by training an English model in parallel, which served as a baseline. Ten native Croatian speakers participated in the experiments. Encouraging results were achieved with the developed model for the Croatian language.
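The abstract does not list the CLT rules, so the following is only a hypothetical sketch of one rule type it refers to: regressive voicing assimilation inside a word, where an obstruent takes the voicing of the obstruent that follows it. The pair table, the digraph handling and the names `graphemes` and `transcribe` are assumptions made for illustration, not the paper's algorithm.

```python
# Hypothetical sketch: regressive voicing assimilation inside a Croatian word.
# NOT the paper's CLT algorithm; it illustrates a single rule type only.

# Voiced obstruent -> voiceless counterpart (assumed pair table).
VOICED_TO_VOICELESS = {"b": "p", "d": "t", "g": "k", "z": "s",
                       "ž": "š", "dž": "č", "đ": "ć"}
VOICELESS_TO_VOICED = {v: k for k, v in VOICED_TO_VOICELESS.items()}
# Voiceless obstruents that trigger devoicing but have no voiced pair here.
VOICELESS_UNPAIRED = {"c", "f", "h"}
# Two-letter spellings of single phonemes.
# Limitation: "dž" across a morpheme boundary (e.g. "nadživjeti") is misread.
DIGRAPHS = ("dž", "lj", "nj")


def graphemes(word: str) -> list[str]:
    """Split a word into phoneme-sized graphemes (lowercased)."""
    w, out, i = word.lower(), [], 0
    while i < len(w):
        if w[i:i + 2] in DIGRAPHS:
            out.append(w[i:i + 2])
            i += 2
        else:
            out.append(w[i])
            i += 1
    return out


def transcribe(word: str) -> list[str]:
    """Each obstruent takes the voicing of the obstruent that follows it.
    Scanning right to left lets the change propagate through clusters."""
    phones = graphemes(word)
    for i in range(len(phones) - 2, -1, -1):
        cur, nxt = phones[i], phones[i + 1]
        next_voiceless = nxt in VOICELESS_TO_VOICED or nxt in VOICELESS_UNPAIRED
        next_voiced = nxt in VOICED_TO_VOICELESS
        if next_voiceless and cur in VOICED_TO_VOICELESS:
            phones[i] = VOICED_TO_VOICELESS[cur]
        elif next_voiced and cur in VOICELESS_TO_VOICED:
            phones[i] = VOICELESS_TO_VOICED[cur]
    return phones


print("".join(transcribe("gradski")))  # gratski
print("".join(transcribe("svatba")))   # svadba
```

A real transcription front end for HTK would add further assimilation types (for example, assimilation by place of articulation) and exception lists; the right-to-left scan is chosen only so that a change can cascade backwards through a consonant cluster.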
Croatian Emotional Speech Analyses on a Basis of Acoustic and Linguistic Features
Acoustic and linguistic speech features are used to estimate the emotional state of utterances collected in the Croatian emotional speech corpus. Analyses are performed for the classification of five discrete emotions (happiness, sadness, fear, anger and neutral state) and for the estimation of two emotional dimensions, valence and arousal. Acoustic and linguistic cues of emotional speech are analyzed separately and are also combined in two types of fusion: feature-level fusion and decision-level fusion. The Random Forest method is used for all analyses, combined with the Info Gain feature selection method for classification tasks and the Univariate Linear Regression method for regression tasks. The main hypothesis is confirmed: fusion increases classification accuracy compared with using the acoustic or the linguistic feature set alone, and it also decreases the root mean squared error when estimating the emotional dimensions. Most of the other hypotheses are also confirmed, which suggests that the acoustic and linguistic cues of emotion in Croatian behave similarly to those in other languages.
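The two fusion types named above can be sketched as follows. Feature-level (early) fusion concatenates the acoustic and linguistic vectors before a single model is trained; decision-level (late) fusion combines the per-class outputs of two separately trained models. The weighted-average combination rule, the equal default weight and the function names are assumptions for illustration; the paper only states that Random Forest was used, whose class probabilities could come from, for example, scikit-learn's `RandomForestClassifier.predict_proba`.

```python
import math


def feature_level_fusion(acoustic: list[float], linguistic: list[float]) -> list[float]:
    """Early fusion: concatenate both feature vectors of one utterance,
    so a single classifier or regressor sees them jointly."""
    return list(acoustic) + list(linguistic)


def decision_level_fusion(p_acoustic: dict[str, float],
                          p_linguistic: dict[str, float],
                          w_acoustic: float = 0.5) -> tuple[str, dict[str, float]]:
    """Late fusion: weighted average of per-class probabilities from two
    separately trained models, then argmax. The weighting rule and the
    0.5 default are assumptions, not values reported in the paper."""
    classes = p_acoustic.keys() | p_linguistic.keys()
    fused = {c: w_acoustic * p_acoustic.get(c, 0.0)
                + (1.0 - w_acoustic) * p_linguistic.get(c, 0.0)
             for c in classes}
    return max(fused, key=fused.get), fused


def rmse(predicted: list[float], actual: list[float]) -> float:
    """Root mean squared error, the metric used for valence/arousal regression."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


label, scores = decision_level_fusion({"anger": 0.6, "fear": 0.4},
                                      {"anger": 0.2, "fear": 0.8})
print(label)  # fear
```

In this toy call the acoustic model slightly favors anger while the linguistic model strongly favors fear, so the averaged scores (0.4 versus 0.6) select fear.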
An Application of Fuzzy Inductive Logic Programming for Textual Entailment and Value Mining
The aim of this preliminary report is to give an overview of textual entailment in natural language processing (NLP), to present our research approach and to explain the possible applications of such a system. Our system presupposes several modules: a sentiment analysis module, an anaphora resolution module, a named entity recognition module and a relationship extraction module. Existing state-of-the-art modules will be used for these, and no research effort will go into them. The research focuses on the main module, which extracts background knowledge from the extracted relationships via resolution and inverse resolution (inductive logic programming). The last part discusses possible economic applications of our research.
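Resolution and inverse resolution can be illustrated in their simplest, propositional and crisp form. This is a hedged sketch under that simplification: the report's module works on extracted relationships and uses a fuzzy ILP formulation, neither of which is modeled here, and the clause encoding (frozensets of string literals, `~` for negation) is an assumption.

```python
# Literals are strings; "~p" is the negation of "p". A clause is a frozenset
# of literals read as a disjunction: {"flies", "~bird"} means bird -> flies.


def negate(lit: str) -> str:
    return lit[1:] if lit.startswith("~") else "~" + lit


def resolve(c1: frozenset, c2: frozenset) -> list[frozenset]:
    """All binary resolvents of two propositional clauses."""
    return [(c1 - {lit}) | (c2 - {negate(lit)})
            for lit in c1 if negate(lit) in c2]


def inverse_resolve(resolvent: frozenset, c1: frozenset, lit: str) -> frozenset:
    """Propositional V-operator: given a resolvent and one parent c1 that was
    resolved on `lit`, return the smallest other parent c2 such that
    resolving c1 with c2 on `lit` reproduces the resolvent."""
    if lit not in c1 or not (c1 - {lit}) <= resolvent:
        raise ValueError("c1 cannot be a parent of this resolvent on lit")
    return (resolvent - (c1 - {lit})) | {negate(lit)}


rule = frozenset({"flies", "~bird"})        # background knowledge: bird -> flies
example = frozenset({"flies", "~penguin"})  # observed: penguin -> flies
induced = inverse_resolve(example, rule, "~bird")
print(sorted(induced))  # ['bird', '~penguin'], i.e. penguin -> bird
```

Inverse resolution here "explains" the observed clause by inducing the missing premise `penguin -> bird`; resolving that induced clause with the rule gives back the observation.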
Methodology for Estimating Emotional States Based on Acoustic Speech Features
In recent years, increasing attention has been paid to the computational estimation of a person's emotional state from the voice, primarily in the context of developing systems for intelligent human-computer interaction. This paper describes the estimation methodology step by step: extraction of acoustic features of emotional speech, reduction of the feature space, and estimation of emotional states with a machine learning method. Emotions are typically represented either as discrete states, such as happiness, anger, fear or disgust, or as dimensions, most often as levels of valence and arousal. Classification methods are used to recognize discrete emotions, and regression methods to estimate dimensional emotion values. The paper gives an overview of state-of-the-art acoustic features for emotion recognition and presents the results of relevant work in this field.
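The first two steps of this methodology (feature extraction and feature-space reduction) can be sketched with the standard library. The specific functionals and the reduction rule (drop constant features, then greedily drop features highly correlated with an already kept one) are illustrative assumptions, not the methods surveyed in the paper; the thresholds are hypothetical.

```python
import math
import statistics


def functionals(contour: list[float]) -> dict[str, float]:
    """Summarize a frame-level acoustic measure (e.g. an F0 contour in Hz)
    with a few common statistical functionals, one value per utterance."""
    return {
        "mean": statistics.fmean(contour),
        "std": statistics.pstdev(contour),
        "min": min(contour),
        "max": max(contour),
        "range": max(contour) - min(contour),
    }


def _pearson(x: list[float], y: list[float]) -> float:
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def reduce_features(rows: list[dict[str, float]],
                    var_eps: float = 1e-12,
                    corr_threshold: float = 0.95) -> list[str]:
    """Return the names of features kept after a simple filter-style reduction."""
    names = list(rows[0])
    cols = {n: [r[n] for r in rows] for n in names}
    # 1) drop (near-)constant features: they carry no information
    kept = [n for n in names if statistics.pvariance(cols[n]) > var_eps]
    # 2) greedily drop features redundant with an already selected one
    selected: list[str] = []
    for n in kept:
        if all(abs(_pearson(cols[n], cols[m])) < corr_threshold for m in selected):
            selected.append(n)
    return selected
```

The reduced feature vectors would then feed a classifier for discrete emotions or a regressor for valence and arousal, as the abstract describes; that last step is left out of the sketch.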
COMPUTER-AIDED PSYCHOTHERAPY BASED ON MULTIMODAL ELICITATION, ESTIMATION AND REGULATION OF EMOTION
Contemporary psychiatry is looking to the affective sciences to understand human behavior, cognition and the mind in health and disease. Since emotions have been recognized as playing a pivotal role in the human mind, an ever-increasing number of laboratories and research centers are interested in affective science, affective neuroscience, affective psychology and affective psychopathology. This paper presents multidisciplinary research results on stress resilience from the Laboratory for Interactive Simulation System at the Faculty of Electrical Engineering and Computing, University of Zagreb. A patient's distorted emotional processing of multimodal input stimuli is predominantly a consequence of cognitive deficits resulting from his or her individual mental health disorder. These emotional distortions, reflected in the patient's multimodal physiological, facial, acoustic and linguistic features in response to the presented stimulation, can be used as indicators of mental illness. Real-time processing and analysis of the patient's multimodal response to annotated input stimuli relies on appropriate machine learning methods. Comprehensive longitudinal multimodal analysis of the patient's emotions, mood, feelings, attention, motivation, decision-making and working memory, synchronized with the multimodal stimuli, yields a highly valuable large database for data mining, machine learning and machine reasoning. The presented multimedia stimulus sequence includes personalized images, movies and sounds, as well as semantically congruent narratives. While the stimuli are presented, the patient rates them emotionally in terms of subjective units of discomfort/distress, discrete emotions, or valence and arousal. These subjective ratings of the input stimuli, together with the corresponding physiological, speech and facial output features, provide enough information to evaluate the patient's cognitive appraisal deficit. Aggregated real-time visualization of this information provides valuable assistance in diagnosing the patient's mental state, giving the therapist deeper and broader insight into the dynamics and progress of psychotherapy.
Emotional state estimation based on data mining of acoustic speech features
Estimating emotional states from speech can play an important role in many fields. In this doctoral thesis, a system for emotional state estimation based on acoustic features of the speech signal was developed; it can be applied in psychotherapy and in the selection and training of candidates for stressful, high-responsibility operations. Because of this potential, particular emphasis was placed on the estimation of speech under stress and on eliciting responses from participants with startle stimuli. The neurobiological basis of emotion generation was investigated, as was the influence of emotions on the biological mechanisms of speech production and, consequently, on individual acoustic parameters and features of the voice. Voice perturbation measures were proposed, i.e. features describing the influence of limbic structures on disturbances in the coordination of the antagonistic process of vocal fold vibration; these features discriminated significantly between stress levels in the voice. They were also found to be robust to voluntary components of speech, specifically the dynamics of the fundamental frequency during an utterance, where conventional perturbation measures (jitter) did not perform as well. The influence of intense impulse-shaped acoustic stimuli, i.e. startle stimuli, on changes in the fundamental frequency of the voice was analyzed. So-called fear-potentiated startle reactions are widely used in the diagnosis of post-traumatic stress disorder, namely in fear conditioning and extinction paradigms. The conventional measure for assessing startle reactions today is electromyography of the orbicularis oculi muscle, i.e. eyeblink analysis. In this thesis, the fundamental frequency response was compared with the orbicularis oculi response, and the two responses were found to be consistent and to have similar properties. Furthermore, an improvement of the conventional system architecture for estimating the emotional dimensions of valence and arousal was proposed, using a priori knowledge of the relationship between these dimensions. The analyses confirmed that this architecture improves estimation accuracy.
This doctoral thesis is the result of research on the project "Adaptive Control of Scenarios in VR Therapy of PTSD", which aims to develop a collaborative, intelligent agent that, as a decision-support tool, could be applied in a number of areas such as the prediction, selection, diagnosis and treatment of mental disorders, especially those caused by stress. The thesis explores the problem of estimating emotional states, stress and acoustic startle responses from acoustic speech features, with emphasis on evaluating the features with statistical analysis methods in the context of these problems. It proposes and evaluates new voice perturbation features that describe the impact of limbic structures on the neural regions responsible for coordinating the antagonistic process of vocal fold vibration. Changes in speech fundamental frequency (F0) are compared with the electromyographic (EMG) response of the orbicularis oculi muscle. The thesis also proposes an improvement of the conventional system architecture for estimating the emotional dimensions of valence and arousal, using a priori knowledge about the relationship between these two dimensions. The introductory chapter defines the domains, motivation and objectives of the research, noting the inherently interdisciplinary nature of the field; it also states the scientific contributions and the structure of the dissertation. The second chapter describes the neurobiological processes through which emotions affect speech production, exploring their influence on the respiration, phonation and articulation mechanisms of speech. Special attention is given to the intrinsic muscles of the larynx, i.e. the phonation mechanisms, which, due to their sensitive structures, are the most vulnerable to the impact of emotions.
The third chapter describes the acoustic speech features commonly used for estimating emotional states and stress. Furthermore, a decomposition of the speech fundamental frequency is proposed in which individual components selectively capture specific neurobiological processes of emotion. Speech perturbation features are proposed that describe the temporal and amplitude aspects of disturbances in vocal fold oscillation, which result from the influence of the limbic system on the cerebellum and brainstem. The proposed features are validated on artificially generated speech perturbations and on speech under stress. In most cases, they showed statistically significant differences across levels of speech perturbation and levels of stress. They were also shown to be satisfactorily robust to the voluntary component of pronunciation, in particular the dynamics of the fundamental frequency, which is their main advantage over conventional speech perturbation measures (e.g. jitter measures). In the fourth chapter, F0 features are validated in the context of the acoustic startle response. Features such as peak value, peak time and duration are validated against changes in the parameters of the startle stimulus (intensity, duration, rise time and spectral characteristics), as well as against the presence and intensity of the startle response. A comparative analysis is performed between the F0 response features and EMG features of the orbicularis oculi muscle response (eyeblink), which is considered the reference measure for startle reaction analysis. The analyses showed similar behavior of F0 and EMG responses when the intensity of the startle stimulus was varied; in both cases, the most statistically significant difference was achieved for the response peak value.
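The proposed perturbation features are not specified in the abstract, so the sketch below shows only the conventional baseline they are compared against, local jitter (mean absolute difference of consecutive glottal periods divided by the mean period), plus one possible way to compute the F0 startle response features named above (peak value, peak time, duration). The duration definition (span of samples at or above half the peak) and all function names are assumptions chosen for illustration.

```python
import statistics


def local_jitter(periods: list[float]) -> float:
    """Conventional local jitter: mean absolute difference of consecutive
    glottal periods divided by the mean period (a fraction, not percent)."""
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    return statistics.fmean(diffs) / statistics.fmean(periods)


def f0_response_features(times: list[float], f0: list[float],
                         onset: float, baseline: float,
                         level: float = 0.5) -> dict[str, float]:
    """Peak value (Hz above baseline), peak latency (s after onset) and
    duration of an F0 response to a startle stimulus at `onset`.
    Duration = span of post-onset samples at or above `level` * peak,
    an assumed definition used only for this illustration."""
    resp = [(t, v - baseline) for t, v in zip(times, f0) if t >= onset]
    t_peak, peak = max(resp, key=lambda tv: tv[1])
    above = [t for t, v in resp if v >= level * peak]
    return {"peak_value": peak,
            "peak_time": t_peak - onset,
            "duration": max(above) - min(above)}
```

For example, periods of 10, 12 and 10 ms give a local jitter of 2 / (32/3) = 0.1875.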
At higher stimulus intensities, a significant increasing trend was observed in the peak values of both F0 and EMG responses as stimulus intensity increased. The fifth chapter describes the methodology of emotional state estimation from acoustic speech features, which is conventionally performed in four sequential steps: speech signal processing with extraction of acoustic measures, calculation of features from those measures, reduction of the feature space, and estimation of emotional states with machine learning methods. The thesis proposes an upgrade of this conventional architecture for estimating the emotion dimensions of valence and arousal, based on a priori relationships between the two dimensions. An a priori model is applied to the conventional estimation process to shift estimation results in the valence-arousal space toward more probable values, according to the level of estimation uncertainty. Two approaches to modeling the a priori knowledge were taken: (a) a single integral model over the valence-arousal space, and (b) the integration of multiple models representing different discrete emotions in the valence-arousal space, specifically happiness, sadness, fear, anger and neutral state. The emotional state estimation system is built and validated using utterances from the Croatian emotional speech corpus, which was collected and annotated in collaboration with the University of Zagreb, Faculty of Humanities and Social Sciences. The sixth chapter validates machine learning methods, specifically support vector machines and random forests, for estimating emotional states, stress and startle responses. In this context, the improvements proposed in the thesis are compared with conventional approaches from the literature.
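The abstract does not give the form of the a priori model, so the following is a sketch under explicit assumptions: the prior is a Gaussian (approach a, one component) or a Gaussian mixture (approach b, one component per discrete emotion) with diagonal covariance, the estimator's uncertainty is a per-dimension variance, and the shifted estimate is the posterior mean. Larger estimation uncertainty pulls the estimate more strongly toward the prior, matching the behavior described above. `Gaussian2D` and `apply_prior` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian2D:
    mean: tuple[float, float]  # (valence, arousal)
    var: tuple[float, float]   # per-dimension variances (diagonal covariance)
    weight: float = 1.0        # mixture weight, e.g. prior emotion frequency


def _pdf(x, mean, var) -> float:
    """Diagonal-covariance Gaussian density evaluated at point x."""
    return math.prod(math.exp(-(xi - mi) ** 2 / (2 * vi)) / math.sqrt(2 * math.pi * vi)
                     for xi, mi, vi in zip(x, mean, var))


def apply_prior(estimate, est_var, priors: list[Gaussian2D]) -> tuple[float, ...]:
    """Posterior mean of (valence, arousal) given a noisy estimate with
    variance est_var and a Gaussian-mixture prior (one component = approach a)."""
    means, resp = [], []
    for g in priors:
        # Per-component posterior mean: precision-weighted average.
        m = tuple((e / ve + mu / vp) / (1 / ve + 1 / vp)
                  for e, ve, mu, vp in zip(estimate, est_var, g.mean, g.var))
        # Responsibility: weight times marginal likelihood of the estimate.
        r = g.weight * _pdf(estimate, g.mean,
                            tuple(vp + ve for vp, ve in zip(g.var, est_var)))
        means.append(m)
        resp.append(r)
    total = sum(resp)
    return tuple(sum(r * m[d] for r, m in zip(resp, means)) / total
                 for d in range(len(estimate)))
```

With a single prior at the origin and equal variances, an estimate of (2, 2) is pulled halfway, to (1, 1); with a mixture, the estimate moves mainly toward the emotion component that best explains it.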
The results justify introducing the new perturbation speech features for classifying speech under stress, applying F0 features to startle response analysis, and using the proposed enhanced method for estimating emotional states. The last chapter concludes the doctoral thesis and offers suggestions for future research. Specific applications of the proposed methods are also discussed.